Abstract: In this paper we look at learner-formulated questions in technology-supported
learning applications. Traditionally, technology-supported learning applications request input
from the learner. The learner’s response is used to assess the knowledge of the learner, to
define a navigation path through the material or to construct a learner model. With our work we
want to add another form of interaction between learner and system, where the learner can pose
questions to the system in a similar fashion as to a human tutor. The paper discusses ways of
dealing with these learner-formulated questions, question formats and existing approaches. It
then introduces our approach for learner-formulated questions, which is based on the Flexible
Structured Coding Language, FSCL. We present two specific approaches, the syntax-based and
the semantic-based approach. After a discussion of these approaches we conclude the paper
with an outline of future work.

1 Introduction

This project is looking at learner-formulated questions in the context of online technology-supported learning 
applications. We focus on two main goals: 

• Learners should have the possibility to pose questions to a technology-supported learning application. 

• These questions should be formulated in the language of the learner. 

Technology-supported learning applications traditionally have features in which the application requests
input from the learner in the form of multiple choice tests or yes/no questions. The response of the learner is used
to assess the knowledge of the learner, to define a navigation path through the material or to construct a learner
model. We have no intention of replacing these features. Instead we want to add another form of interaction
between learner and technology-supported learning application. We want to give the learner the possibility of
posing questions to the computer system in much the same way as asking questions of a human tutor. Instead of
being only reactive, the learners can actively request information from the system, independently of the confines
of the system.

To achieve our goals we have to define a format in which the users can formulate questions. If we want to mirror 
the student-tutor situation we need to have a format which is ‘natural’ for the learner. Ideally we would use 
natural language which is our natural form of communication, is expressive and flexible, and is familiar to the 
learners. 

Once we allow the learner to pose questions to the system in a ‘human understandable’ format, the system faces 
the task of responding to these questions. Before looking at the different possibilities for question formats we 
want to describe the various possibilities for interaction between learner, technology-supported learning 
application and human tutor: 

• In a first scenario the learner formulates a question which is linked to a specific position in the teaching
material. The question is then submitted directly to a human tutor. The tutor provides the answer, which is stored
together with the question for perusal by the learner and for future reference by other learners.

• In the second scenario the learner again formulates a question which is attached to a specific position in the
teaching material. This time, the question is not directly transmitted to the human tutor. The computer system
first analyses the question for syntactic and semantic characteristics. It compares the question against previously
asked questions. If a semantically equivalent question and answer set is available, this set is returned to the user
without intervention of the human tutor. If only a semantically close question is stored (or no related question at
all), the question is passed on to the human tutor. The tutor assesses the suitability of the semantically close
question and answer set, edits where necessary and proceeds as in scenario one.

• The third alternative parallels the second scenario in how the question is posed and compared against
previously asked questions. Yet instead of passing a question that is new, or only semantically close to a stored
one, to a human tutor, the system generates an answer. The question is matched against a knowledge repository,
a suitable answer is generated and returned to the learner. In this scenario the question-answer dialogue is
performed without involvement of a human tutor.
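
The routing decision in the second scenario can be sketched as follows. This is only an illustration under stated assumptions, written in Python rather than taken from our prototypes: the names `route_question`, `StoredQA`, `Match` and `toy_compare` are hypothetical, and `toy_compare` is a deliberately crude word-overlap placeholder for the semantic comparison discussed later in the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Match(Enum):
    EQUIVALENT = "equivalent"
    CLOSE = "close"
    NONE = "none"


@dataclass
class StoredQA:
    question: str
    answer: str


def route_question(
    question: str,
    repository: List[StoredQA],
    compare: Callable[[str, str], Match],
) -> Tuple[str, Optional[StoredQA]]:
    """Scenario two: answer directly only for an equivalent stored question.

    Returns ("answer", qa) when an equivalent question is stored, otherwise
    ("tutor", closest), where closest is the first close match or None.
    """
    closest = None
    for qa in repository:
        result = compare(question, qa.question)
        if result is Match.EQUIVALENT:
            return "answer", qa
        if result is Match.CLOSE and closest is None:
            closest = qa
    return "tutor", closest


def toy_compare(a: str, b: str) -> Match:
    """Placeholder comparison (assumption): plain word overlap, not semantics."""
    words_a = set(a.lower().rstrip("?").split())
    words_b = set(b.lower().rstrip("?").split())
    if words_a == words_b:
        return Match.EQUIVALENT
    if len(words_a & words_b) >= 2:
        return Match.CLOSE
    return Match.NONE
```

Passing the comparison in as a function keeps the routing independent of how equivalence and closeness are eventually measured.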

The work we are describing in this paper addresses the second scenario. The approach we use for formulating the 
questions is compatible with a related content representation technique, which should allow us to move into the 
third scenario. In the next section we discuss related approaches to user querying before we introduce our 
techniques. 

2 Review of Related Work 

In the first scenario we have described, the questions are evaluated solely by the human tutor. The system assists 
the dialogue only by transmitting questions and answers between learner and tutor and by keeping a repository of 
previously asked and answered questions. The idea of collecting questions/answers and making them available to 
later users was employed in the context of a portal server for the first time in the medical prototype portal 
"Infomed-Austria" (Maurer et al., 1999, www.infomed-austria.at). In the context of technology-supported 
learning applications this approach has been implemented in GENTLE (Dietinger et al. 1998 and 1999, 
www.gentle-wbt.com), using natural language for questions and answers. 

To move to scenarios two and three we need to have a ‘computer understandable’ format for the questions.
Searching through documents on a keyword basis is not semantically rich enough (Davenport, 1996). Besides its
semantic restrictions, a keyword mechanism is not the desired type of interaction for our work as we attempt to
mirror human-to-human communication. Natural language processing faces a range of problems like lexical
ambiguity, ambiguous sentence structures or context dependencies, which have yet to be overcome (Smeaton,
1997, Sowa, 2000).

The AskJeeves approach (Basch, 1999; www.askjeeves.com) allows the input of any natural language question.
It uses a proprietary parsing technology to interpret user queries. The queries are matched against templates and
a knowledge base containing millions of previously asked and answered questions. The answers to questions are
produced by human editorial staff (Basch, 1999; Chowdhury, 1999). AskJeeves returns to the user previously
asked questions with related meaning or keywords. Selecting one of these questions leads to web pages
providing answers. The repertoire and understanding of AskJeeves are very impressive, yet not specific enough
for our needs. In contrast to the AskJeeves context, we work within specific domains where we want to
allow the users to enquire not mainly about terms but about cognitively complex concepts. The following two
examples illustrate these points:

• What is the difference between MPEG and JPEG? 

AskJeeves focuses both on the domain-specific keywords contained in the question and on question-generic
terms. It returns questions relating to ‘MPEG’ or ‘JPEG’ (regarding the understanding of these or other computer
related terms; regarding the download of MPEG music files) and to comparisons (yet not for MPEG and JPEG
but for internet shopping sites). In our context we need to provide a much more specific answer which directly
relates to a comparison between MPEG and JPEG formats.

• Can JPEG files be converted into MPEG files? 

For this question AskJeeves gives some of the same answers as for the previous question (as links to downloads 
of music files). Additionally, it offers reference to conversion programs (relating to Macintosh and PC 
platforms). Again, we need to be more specific and want to provide different, more relevant answers to both 
example questions. 

3 Learner-Formulated Questions Based on FSCL 

As outlined above we want to provide a mechanism in which learners can pose questions to a computer system
in a user-friendly, ‘natural’ way, similar to addressing a human tutor. We want the computer system to
‘understand’ these questions, to compare these questions to a database of previously asked and answered
questions, and to respond with semantically equivalent or close question and answer sets. We base our approach
on the Flexible Structured Coding Language, FSCL (Heinrich et al., 1999). FSCL is natural-language-like, has a
flexible, user-extendable vocabulary, which is arranged in categories, and a fixed grammar based on these
categories. FSCL offers a subset of natural language, which is rich enough to express complex content and
structured enough to be accessible to automated processing.

To develop a ‘natural language-like question system’ we first collected natural language questions and reviewed 
the grammatical structure of English language questions. We extracted the most important question structures 
and investigated how we could formalise these structures based on the FSCL coding language. We developed 
two parallel approaches, one syntax-based and one semantic-based. In the following sections we outline this 
work in more detail. 

3.1 English Language Question Structures 

We have analysed a set of 500 sample questions and studied books on English grammar (Wardhaugh, 1995, 
Klammer et al. 1992, Alexander, 1988). Within the space of this paper we can only very briefly describe the 
main question structures we have identified. For the discussion of these question structures we need to define 
some key terms: 

• Auxiliary verb: a word used in conjunction with verbs (‘do’, ‘have’, ‘may’, ...).

• NounPhrase: a word phrase of one or more nouns (or pronouns) which can contain adjectives and 
conjunctions (‘a good compression format like MPEG’). 

• ActionPhrase: one or more predicates where a predicate consists of a verb (including auxiliaries and 
adverbs) plus a noun phrase as its object (‘... is playing MPEG video’). 

• WhPhrase: an interrogative word (‘who’, ‘where’, ‘what’, ...) phrase which can contain a noun phrase 
(‘how many compression formats ...’). 

• AdditionalClause: a declarative sentence introduced by a conjunction (‘... before the application is playing 
the MPEG video’). 

We now use these terms to describe some of the question structures (and give examples in Figure 1). Our first 
group of questions, the general yes/no questions, have two basic forms (the brackets denote an optional part): 

(A) <auxiliary verb><nounPhrase><actionPhrase>[<additionalClause>] 

(B) <auxiliary verb><nounPhrase><object>[<additionalClause>] 

The second group of questions we want to introduce are ‘wh-questions’. These questions start with an 
interrogative word. They differ from a yes/no question by asking for missing information rather than simply 
requesting confirmation or denial of information supplied in the question. Some common formats of wh- 
questions are: 

(C) <whPhrase><actionPhrase>[<additionalClause>] 

(D) <whPhrase><auxiliary verb><nounPhrase>[<additionalClause>] 

(E) <whPhrase><auxiliary verb><nounPhrase><actionPhrase>[<additionalClause>] 

We regard imperative declarative sentences as a third group of questions. While not strictly speaking questions 
these sentences are useful in our context as they elicit information: 

(F) <verb><object> 
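
As an illustration of how formats (A) to (F) could be checked mechanically, the following minimal Python sketch matches a sequence of already identified constituents against the six formats. It assumes the constituents have been labelled beforehand; the names `FORMATS`, `ALLOWS_CLAUSE` and `classify` are hypothetical and not part of FSCL.

```python
from typing import List, Optional

# Required constituents of each format from Section 3.1.
FORMATS = {
    "A": ["auxiliary verb", "nounPhrase", "actionPhrase"],
    "B": ["auxiliary verb", "nounPhrase", "object"],
    "C": ["whPhrase", "actionPhrase"],
    "D": ["whPhrase", "auxiliary verb", "nounPhrase"],
    "E": ["whPhrase", "auxiliary verb", "nounPhrase", "actionPhrase"],
    "F": ["verb", "object"],
}

# Formats that may end with the optional <additionalClause>.
ALLOWS_CLAUSE = {"A", "B", "C", "D", "E"}


def classify(constituents: List[str]) -> Optional[str]:
    """Return the format letter for a constituent sequence, or None."""
    for name, required in FORMATS.items():
        if constituents == required:
            return name
        if name in ALLOWS_CLAUSE and constituents == required + ["additionalClause"]:
            return name
    return None
```

For example, the constituents of ‘When was MPEG introduced?’ (whPhrase, auxiliary verb, nounPhrase, actionPhrase) are classified as format (E), while the same sequence without the action phrase is format (D).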

3.2 Principles of a FSCL-Based Question System 

FSCL is built on a number of principles which we have adopted and partly modified for this work. 

• Vocabulary: The FSCL vocabulary is fully defined by the FSCL user. Usually, words are taken from the 
specific application domain the user works in. The words have to be associated with the FSCL categories. They 
can be arranged in hierarchies to facilitate retrieval of description sentences and are stored in a database. The 
FSCL question system retains these core features with one exception. In the question system we distinguish 
between application domain-specific (as in the original FSCL) and question-specific vocabulary. Analysing our 
sample questions and the English language grammar we can identify a range of words which are question- 
specific, regardless of the application domain. Examples are the question words ‘where’, ‘what’ or ‘which’. As a 
consequence we predefine these words in the question system vocabulary. 

• Categories: The categories are an important concept in FSCL as they allow the definition of a fixed grammar 
without pre-determining the vocabulary. We retain the category idea but add question specific categories for 
reasons as outlined in the vocabulary section. 

• Grammar: Again we retain the FSCL approach of defining an LL(2)-type grammar based on the category
identifiers as terminal symbols.

• Semantic tree: FSCL description sentences are parsed and then stored in semantic tree format. This format is
comparable to abstract syntax trees as found in literature on compiling techniques. The semantic trees do not give
the parse structure but the semantic structure of a sentence by displaying the subject - verb - object relationships
within the sentence. The question system uses semantic trees to represent the question structures and the
semantic trees are the basis for comparison of questions. The comparison of two semantic trees contains two
interwoven processes. A comparison of the tree structures looks for the existence of the same sentence parts at
equivalent positions in each tree. The actual words in these parts are compared for identity using features of the
vocabulary definition like the word hierarchies.

Type (A): ‘Was the video started before the player was realised?’
    ‘Was’               ‘the video’         ‘started’           ‘before the player was realised’
    <auxiliary verb>    <nounPhrase>        <actionPhrase>      <additionalClause>

Type (B): ‘Is MPEG a new technology?’
    ‘Is’                ‘MPEG’              ‘a new technology’
    <auxiliary verb>    <nounPhrase>        <object>

Type (C): ‘What happens when the application starts the player?’
    ‘What’              ‘happens’           ‘when the application starts the player’
    <whPhrase>          <actionPhrase>      <additionalClause>

Type (D): ‘What is Java?’
    ‘What’              ‘is’                ‘Java’
    <whPhrase>          <auxiliary verb>    <nounPhrase>

Type (E): ‘When was MPEG introduced?’
    ‘When’              ‘was’               ‘MPEG’              ‘introduced’
    <whPhrase>          <auxiliary verb>    <nounPhrase>        <actionPhrase>

Type (F): ‘Give two examples for compression formats!’
    ‘Give’              ‘two examples for compression formats’
    <verb>              <object>

Figure 1: Examples for the basic question formats
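
The two interwoven comparison processes described for semantic trees in Section 3.2 can be sketched roughly as follows. This is a simplified Python illustration, not the FSCL implementation: a tree is reduced to a mapping from level name to word, the hierarchy contents are invented for the example, and treating hierarchically related words as a ‘close’ match is our assumption for the sketch.

```python
from typing import Dict, Optional, Set

# Hypothetical word hierarchy: child -> parent (None for top-level words).
HIERARCHY: Dict[str, Optional[str]] = {
    "compression format": None,
    "MPEG": "compression format",
    "JPEG": "compression format",
    "player": None,
}


def ancestors(word: str) -> Set[str]:
    """Collect all words above `word` in the hierarchy."""
    seen: Set[str] = set()
    parent = HIERARCHY.get(word)
    while parent is not None and parent not in seen:
        seen.add(parent)
        parent = HIERARCHY.get(parent)
    return seen


def words_related(a: str, b: str) -> bool:
    """True if one word is an ancestor of the other."""
    return a in ancestors(b) or b in ancestors(a)


def compare_trees(t1: Dict[str, str], t2: Dict[str, str]) -> str:
    """Return 'equivalent', 'close' or 'different' for two simplified trees."""
    # Structure check: the same sentence parts must exist in both trees.
    if t1.keys() != t2.keys():
        return "different"
    result = "equivalent"
    # Word check: identical words, or words related through the hierarchy.
    for level in t1:
        w1, w2 = t1[level], t2[level]
        if w1 == w2:
            continue
        if words_related(w1, w2):
            result = "close"
        else:
            return "different"
    return result
```

With this hierarchy, a tree whose subject is ‘MPEG’ compares as close to one whose subject is ‘compression format’, and as different from one whose subject is ‘player’.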

3.3 Syntax-Based Approach 

The syntax-based approach for our question system closely follows the grammar structures given in Section 3.1.
The main focus is to capture the syntax of these questions. The original FSCL categories have been retained and
two new word categories, ‘W - WhWord’ and ‘U - Auxiliary’, have been added. The new grammar builds on
the original FSCL grammar. The original grammar was designed for declarative sentences and is now used for
the additional clauses. New grammar rules are introduced to deal with the question-specific structures (like
question word phrases) and newly introduced elements (like action and noun phrases). The semantic tree for the
syntax-based approach has nine levels. Figure 2 shows two questions in their semantic tree representation.

The syntax-based approach has the advantage that it allows the recognition of a wide range of simple and
relatively complex question structures. It also prepares the ground for the comparison of different questions,
because the semantic tree format identifies the role of every word in a question sentence. As in the original FSCL
approach, two questions can be compared by comparing their tree structures and their words in equivalent tree
levels and positions.
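
The first step of the syntax-based approach, translating the words of a question into category identifiers on which the grammar is defined, might look like the following Python sketch. Only the identifiers ‘W’ (WhWord) and ‘U’ (Auxiliary) come from the text above; the identifiers ‘N’ and ‘V’, the vocabulary entries and the function name `categorise` are assumptions made for the example.

```python
import re
from typing import Dict, List

# Predefined question-specific vocabulary: 'W' = WhWord, 'U' = Auxiliary.
QUESTION_VOCABULARY: Dict[str, str] = {
    "what": "W", "when": "W", "where": "W", "which": "W", "who": "W",
    "is": "U", "was": "U", "do": "U", "have": "U", "may": "U",
}


def categorise(question: str, domain_vocabulary: Dict[str, str]) -> List[str]:
    """Map each word of a question to its category identifier.

    Raises ValueError for a word that is in neither vocabulary, since a
    question can only be formulated from defined vocabulary.
    """
    categories = []
    for word in re.findall(r"[A-Za-z]+", question.lower()):
        if word in QUESTION_VOCABULARY:
            categories.append(QUESTION_VOCABULARY[word])
        elif word in domain_vocabulary:
            categories.append(domain_vocabulary[word])
        else:
            raise ValueError(f"word not in vocabulary: {word!r}")
    return categories


# Hypothetical user-defined domain vocabulary ('N' and 'V' are placeholders
# for whatever identifiers the original FSCL categories use).
DOMAIN_VOCABULARY = {"mpeg": "N", "java": "N", "introduced": "V"}
```

The resulting sequence of identifiers, for example W U N V for ‘When was MPEG introduced?’, is the input a category-based parser would work on.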

3.4 Semantic-Based Approach 

The semantic-based approach focuses on the meaning of questions. We looked through our sample questions and 
grouped together questions which ask for the same or closely related concepts. These questions can have 
different syntactical structures while being semantically similar: 

‘In terms of compression which of MPEG or JPEG is better?’ 

‘Why is MPEG better than JPEG?’ 

These questions can have equivalent syntactical structure but use different question-specific vocabulary to ask 
for semantically equivalent concepts: 

‘How is MPEG similar to or different from JPEG?’ 

‘How does MPEG compare with JPEG?’

Level                            Q1                        Q2
1 question word level            What
2 question auxiliary level                                 Is
3 question subject level                                   MPEG
4 question verb level            happens
5 question object-like level                               a new technology
6 FSCL actor/concept level       when the application
7 FSCL activity level            starts
8 FSCL object level              the player

Figure 2: Semantic trees for example questions using syntax-based approach



Using this approach we have identified twelve generic question types. For each question type semantically
equivalent and close question variations have been defined. The question types cover the basic formats of yes/no,
interrogative word and imperative statement questions as described in Section 3.1. Again we follow the principal
idea behind FSCL of using word categories as the basis for the grammar definition, yet the original FSCL is not
integrated as closely as in the syntax-based approach. Among the category definitions are three generic
categories (for nouns, verbs and adjectives) to contain the domain vocabulary. The other categories represent the
components of a question-specific vocabulary. For example, the category ‘whatWord’ contains (at least at this
stage in the project) only the word ‘what’. Another interrogative word like ‘which’ finds its place in its own
category ‘whichWord’. This is different from the syntax-based approach, where all interrogative words belong to
just one category, ‘WhWord’, as they all take the same place regarding the syntax of a question. Another
category in the semantic-based approach is ‘greaterWord’, which contains a collection of comparison words with
similar meaning (‘better than’, ‘greater than’, ‘better’, ‘better between’). As in the other approach, an LL(2)
grammar is defined on the category identifiers. The semantic tree spans seven levels. Figure 3 gives examples of
semantic trees for two of the generic question types.

The strength of the semantic-based approach lies in grouping questions of similar or close meaning. Questions
within a group are translated into the same semantic tree structure, which indicates their relatedness and
simplifies the comparison of trees. Compared to the syntax-based approach, the range of question structures
covered and the power of expressing domain concepts are more limited at this stage.
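
The effect of grouping words of similar meaning into one category can be illustrated with a small Python sketch. The category contents are those named above; the function names and the longest-phrase-first matching strategy are assumptions for this example and do not describe our grammar.

```python
from typing import Dict, List, Tuple

# Question-specific categories as named in Section 3.4.
CATEGORIES: Dict[str, List[str]] = {
    "greaterWord": ["better than", "greater than", "better", "better between"],
    "whatWord": ["what"],
    "whichWord": ["which"],
}


def build_phrase_table(categories: Dict[str, List[str]]) -> List[Tuple[List[str], str]]:
    """List (phrase words, category) pairs, longest phrases first."""
    table = [
        (phrase.split(), name)
        for name, phrases in categories.items()
        for phrase in phrases
    ]
    table.sort(key=lambda entry: len(entry[0]), reverse=True)
    return table


def to_categories(question: str, table: List[Tuple[List[str], str]]) -> List[str]:
    """Replace known phrases by their category name; keep other words."""
    words = question.lower().rstrip("?!.").split()
    result = []
    i = 0
    while i < len(words):
        for phrase, name in table:
            if words[i:i + len(phrase)] == phrase:
                result.append(name)
                i += len(phrase)
                break
        else:
            # No category phrase starts here: keep the word itself.
            result.append(words[i])
            i += 1
    return result


PHRASE_TABLE = build_phrase_table(CATEGORIES)
```

Under these assumptions ‘Why is MPEG better than JPEG?’ and ‘Why is MPEG greater than JPEG?’ reduce to the same sequence, because both comparison phrases belong to ‘greaterWord’.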

3.5 Prototype Systems 

We have implemented prototype systems for both approaches in Java. The prototypes contain the predefined
question-specific vocabulary and they allow the users to enter their own domain-specific vocabulary. Based on
the vocabulary, the users can formulate questions. These questions are checked for syntax and, if syntactically
correct, stored in semantic tree format in a database. Any new question is compared to the questions already
stored in the database. The user is presented with the semantic tree format of the new question and a listing of
any relevant existing questions.

Q1: whyPhrase (Why is); greaterPhrase (better than); dConcept (MPEG); dConcept (JPEG)
Q2: whichPhrase (which of); termsPhrase (In terms of); dConcept (compression); greaterPhrase (is better);
    dConcept (MPEG); conjunction (or); dConcept (JPEG)
Tree levels: 1 query level; 2 terms level; 3 termsConcept level; 4 compare level; 5 subject level;
    6 activity level; 7 object level

Figure 3: Semantic trees for example questions using semantic-based approach

4 Conclusion and Further Work

We have achieved the first steps towards learner-formulated questions in online technology-supported learning
applications. Based on a review of sample questions and English language grammar we have identified the main
question structures used. We then followed both syntactic and semantic approaches to formalise these question
structures. These approaches are based on the Flexible Structured Coding Language, FSCL. We use
question-specific categories of words and generic categories, which can be filled by the user with vocabulary for
their specific domains. Grammars are defined based on categories and allow parsing for the structure of the
questions. Syntactically correct questions are stored in semantic tree formats to facilitate the comparison of
questions for semantic equivalence and closeness. The questions we deal with are natural-language-like. We have
implemented prototype systems to allow for testing of our ideas.

The next step in our work will be to combine our syntax-based and semantic-based approaches. We want to be
able to deal with the variety and complexity of question structures from the syntax-based approach and have the
grouping of questions for semantic relatedness from the semantic-based approach. We then need to collect a
substantial body of questions for testing our question structures and measures of relatedness. In the longer term
we want to move towards the third scenario we have introduced, where answers to questions can be deduced
from domain-specific content representations using FSCL.